Paraphrase type identification for plagiarism detection using contexts and word embeddings
Abstract
Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common strategies used by plagiarists. However, the similarity reports generated by most plagiarism detection systems only provide a similarity score and produce matching sections of text with their possible sources. In this research we propose methods to identify two important paraphrase types, synonymous substitution and word reordering, in paraphrased, plagiarised sentence pairs. We propose a three-staged approach that uses context matching and pretrained word embeddings for identifying synonymous substitution and word reordering. Our approach indicates that the use of the Smith Waterman Algorithm for Plagiarism Detection and ConceptNet Numberbatch pretrained word embeddings produces the best performance in terms of $F_1$ scores. This research can be used to complement the similarity reports of currently available plagiarism detection systems by incorporating paraphrase type identification into plagiarism detection.
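To make the alignment idea in the abstract concrete, the following is a minimal sketch of word-level Smith-Waterman local alignment between a source sentence and a suspected paraphrase. It is not the authors' implementation: the function names `smith_waterman` and `substitution_candidates`, and the scoring values (match = 2, mismatch = -1, gap = -1), are assumptions chosen for illustration. Aligned word pairs that differ are only candidates for synonymous substitution; a further check, for example cosine similarity between pretrained word embeddings, would be needed to confirm them and is not shown here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of two token lists.

    Returns (best_score, pairs), where pairs is a list of aligned
    (token_from_a, token_from_b) tuples; None marks a gap.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


def substitution_candidates(pairs):
    """Aligned word pairs that differ: candidates for synonymous substitution."""
    return [(x, y) for x, y in pairs
            if x is not None and y is not None and x != y]


if __name__ == "__main__":
    source = "the quick brown fox jumps".split()
    suspect = "the fast brown fox jumps".split()
    score, pairs = smith_waterman(source, suspect)
    print(score)                           # 7
    print(substitution_candidates(pairs))  # [('quick', 'fast')]
```

Aligning on whole words rather than characters keeps the matched context (the identical words on either side) explicit, which is what lets a differing pair such as "quick"/"fast" be isolated for a later embedding-based similarity check.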
Similar resources
Detecting Cross-Lingual Plagiarism Using Simulated Word Embeddings
Cross-lingual plagiarism (CLP) occurs when texts written in one language are translated into a different language and used without acknowledging the original sources. One of the most common methods for detecting CLP requires online machine translators (such as Google or Microsoft translate) which are not always available, and given that plagiarism detection typically involves large document com...
Clickbait detection using word embeddings
Clickbait is a pejorative term describing web content that is aimed at generating online advertising revenue, especially at the expense of quality or accuracy, relying on sensationalist headlines or eye-catching thumbnail pictures to attract click-throughs and to encourage forwarding of the material over online social networks. We use distributed word representations of the words in the title as...
Methods for Detecting Paraphrase Plagiarism
Paraphrase plagiarism is one of the difficult challenges facing plagiarism detection systems. Paraphrasing occurs when texts are lexically or syntactically altered to look different, but retain their original meaning. Most plagiarism detection systems (many of which are commercial) are designed to detect word co-occurrences and light modifications, but are unable to detect severe semantic ...
Paraphrase Identification Using Weighted Dependencies and Word Semantics
We present in this article a novel approach to the task of paraphrase identification. The proposed approach quantifies both the similarity and dissimilarity between two sentences. The similarity and dissimilarity are assessed based on lexico-semantic information, i.e., word semantics, and syntactic information in the form of dependencies, which are explicit syntactic relations between words in a...
Different Contexts Lead to Different Word Embeddings
Recent work on learning word representations has been applied successfully to many NLP applications, such as sentiment analysis and question answering. However, most of these models assume a single vector per word type without considering polysemy and homonymy. In this paper, we present an extension to the CBOW model which not only improves the quality of embeddings but also makes embeddings suitab...
Journal
Journal title: International Journal of Educational Technology in Higher Education
Year: 2021
ISSN: 2365-9440
DOI: https://doi.org/10.1186/s41239-021-00277-8